17 research outputs found

    Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference

    Full text link
    Secure multi-party computation (MPC) allows users to offload machine learning inference to untrusted servers without having to share their privacy-sensitive data. Despite its strong security properties, MPC-based private inference has not been widely adopted in the real world due to its high communication overhead. When evaluating ReLU layers, MPC protocols incur a significant amount of communication between the parties, making end-to-end execution multiple orders of magnitude slower than its non-private counterpart. This paper presents HummingBird, an MPC framework that significantly reduces the ReLU communication overhead by using only a subset of the bits to evaluate ReLU on a smaller ring. Based on theoretical analyses, HummingBird identifies bits in the secret share that are not crucial for accuracy and excludes them during ReLU evaluation to reduce communication. With its efficient search engine, HummingBird discards 87--91% of the bits during ReLU and still maintains high accuracy. On a real MPC setup involving multiple servers, HummingBird achieves a 2.03--2.67x end-to-end speedup on average without introducing any errors, and up to an 8.64x average speedup when some accuracy degradation can be tolerated, owing to its up to 8.76x communication reduction.
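
    As a rough illustration of the reduced-ring idea, the plaintext sketch below keeps only a small window of high-order bits from each additive share before testing the sign that ReLU needs. The window position, width, and all names are assumptions for illustration, not HummingBird's actual protocol; note that dropping the low bits discards a possible carry, which is the kind of error source the paper's analysis would have to bound.

    ```python
    # Plaintext sketch of evaluating ReLU's sign test on a reduced ring.
    # Only a 4-bit window of each 32-bit share survives (~87% discarded).
    RING_BITS = 32          # original share ring Z_{2^32}
    WINDOW_HI = 31          # assumed: keep bits [28..31] only
    WINDOW_LO = 28

    def to_window(share: int) -> int:
        """Project a 32-bit share onto the small ring spanned by the window."""
        return (share >> WINDOW_LO) & ((1 << (WINDOW_HI - WINDOW_LO + 1)) - 1)

    def relu_sign_on_window(share0: int, share1: int) -> bool:
        """Reconstruct only the windowed bits and test the sign bit there.

        In a real MPC deployment this comparison would itself be a secure
        protocol; reconstructing in the clear here only shows why fewer
        bits mean less communication. Dropping the low bits ignores any
        carry out of them, which is where approximation error can enter.
        """
        small_ring = 1 << (WINDOW_HI - WINDOW_LO + 1)
        windowed = (to_window(share0) + to_window(share1)) % small_ring
        msb = small_ring >> 1
        return (windowed & msb) == 0   # True -> treat value as non-negative

    # Example: x = -5 shared additively as share0 + share1 (mod 2^32).
    x = -5 % (1 << RING_BITS)
    share0 = 0x1234ABCD
    share1 = (x - share0) % (1 << RING_BITS)
    print(relu_sign_on_window(share0, share1))  # False: ReLU would output 0
    ```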

    Intermittent Computing: Challenges and Opportunities

    Get PDF
    The maturation of energy-harvesting technology and ultra-low-power computer systems has led to the advent of intermittently-powered, batteryless devices that operate entirely using energy extracted from their environment. Intermittently operating devices present a rich vein of programming languages research challenges, and the purpose of this paper is to illustrate these challenges to the PL research community. To provide depth, this paper includes a survey of the hardware and software design space of intermittent computing platforms. On the foundation of these research challenges and the state of the art in intermittent hardware and software, this paper describes several future PL research directions, emphasizing a connection between intermittence, distributed computing, energy-aware programming and compilation, and approximate computing. We illustrate these connections with a discussion of our ongoing work on programming for intermittence, and on building and simulating intermittent distributed systems.
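
    To make "intermittent execution" concrete, here is a toy simulation under an assumed model: a dictionary stands in for non-volatile memory, and a random exception stands in for harvested energy running out. Real platforms checkpoint to FRAM or flash from C on a microcontroller; this is only the shape of the problem.

    ```python
    # Toy simulation of intermittent execution: progress survives power
    # failures only if it was checkpointed to "non-volatile" state first.
    import random

    nvm = {"i": 0, "acc": 0}            # persists across "power failures"

    def run_until_power_fails():
        i, acc = nvm["i"], nvm["acc"]   # restore state from last checkpoint
        while i < 10:
            if random.random() < 0.3:   # harvested energy runs out
                raise RuntimeError("power failure")
            acc += i                    # the actual work
            i += 1
            nvm["i"], nvm["acc"] = i, acc   # checkpoint after each step
        return acc

    while True:
        try:
            print("result:", run_until_power_fails())   # result: 45
            break
        except RuntimeError:
            pass                        # reboot: re-enter from checkpoint
    ```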

    Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

    Full text link
    Online personalized recommendation services are generally hosted in the cloud, where users query the cloud-based model to receive recommendations such as merchandise of interest or news feed items. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, little attention has been paid to their potential information leakage. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index into the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
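
    The core leakage channel can be shown in a few lines: even if every arithmetic step after the lookup were hidden, the indices used to gather rows from the embedding table are visible to whoever hosts the table. The table size, dimensions, and item IDs below are made up for illustration.

    ```python
    # Minimal sketch of access-pattern leakage from sparse-feature lookups.
    import numpy as np

    rng = np.random.default_rng(0)
    embedding_table = rng.standard_normal((10_000, 16))  # items x embed dim

    observed_accesses = []  # what an untrusted cloud operator could log

    def embed(clicked_item_ids):
        observed_accesses.append(list(clicked_item_ids))      # the leakage
        return embedding_table[clicked_item_ids].sum(axis=0)  # pooled embedding

    # Two queries sharing click history can be linked to the same user,
    # letting the operator profile that user without seeing model outputs.
    embed([42, 7, 9_000])
    embed([42, 7, 123])
    print(observed_accesses)  # [[42, 7, 9000], [42, 7, 123]]
    ```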

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Full text link
    On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing it to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5× additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second -- a >100× throughput improvement over a CPU-based baseline -- while maintaining model accuracy.
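
    For intuition about what PIR guarantees, the sketch below implements the textbook two-server XOR construction rather than the paper's GPU-accelerated scheme: each server sees only a uniformly random index set, yet the client recovers exactly the embedding row it wants.

    ```python
    # Toy two-server XOR-based PIR over a tiny "embedding table".
    import secrets

    def xor_rows(db, idxs):
        out = bytes(len(db[0]))
        for i in idxs:
            out = bytes(a ^ b for a, b in zip(out, db[i]))
        return out

    db = [secrets.token_bytes(8) for _ in range(16)]   # 16 embedding rows
    want = 5

    # Client: random subset S to server A, S symmetric-diff {want} to B.
    # Each query alone is a uniformly random set, revealing nothing.
    S = {i for i in range(len(db)) if secrets.randbits(1)}
    S_b = S ^ {want}

    answer_a = xor_rows(db, S)     # computed by server A
    answer_b = xor_rows(db, S_b)   # computed by server B

    # XORing the answers cancels every row except row `want`.
    row = bytes(a ^ b for a, b in zip(answer_a, answer_b))
    assert row == db[want]
    print(row.hex())
    ```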

    Intermittent asynchronous peripheral operations

    No full text
    Energy harvesting enables battery-less sensing applications, but causes executions to become intermittent as a result of erratic energy provisioning. Intermittent executions pose challenges to peripheral consistency that threaten to leave peripheral-bound workloads in failed states or to impede forward progress of programs. Intermittent synchronous peripheral operations are supported in existing literature for specific kinds of peripherals. Asynchronous peripheral operations enable reactive concurrency in application implementations, which increases reactivity and improves energy consumption, but they lack dedicated support in intermittent settings. We present Karma, the first general abstraction and system design to support both synchronous and asynchronous operations in an intermittent setting. Karma employs a novel combination of peripheral roll-forward and computation roll-back to a rendezvous point, guaranteeing consistency. It remains transparent to application programmers and peripheral drivers, which favours portability. Our evaluation, based on three applications running on prototype hardware and using diverse energy sources, indicates that the intermittent asynchronous peripheral support provided by Karma boosts data throughput by 83% compared to existing literature.
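
    A highly simplified sketch of the rendezvous pattern suggested by the abstract, as an illustrative guess rather than Karma's actual design: a completed peripheral operation's result is persisted so it is never re-executed (roll-forward), while interrupted computation restarts at the program point that consumes it (roll-back to the rendezvous).

    ```python
    # Toy roll-forward / roll-back rendezvous; all names are hypothetical.
    nvm = {"radio_ack": None, "resume_at": "start"}  # survives power loss

    def radio_send_async(payload):
        # Peripheral op completes in the background; its outcome is
        # committed to non-volatile state so a later power failure
        # cannot undo it (roll-forward).
        nvm["radio_ack"] = f"ack:{payload}"

    def boot():
        if nvm["resume_at"] == "start":
            radio_send_async("temp=21C")
            nvm["resume_at"] = "rendezvous"      # roll-back target
            raise RuntimeError("power failure")  # simulate losing RAM
        # Rendezvous: consume the rolled-forward peripheral result
        # without re-driving the radio.
        return nvm["radio_ack"]

    try:
        boot()
    except RuntimeError:
        pass           # device reboots...
    print(boot())      # ...resumes at the rendezvous: "ack:temp=21C"
    ```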

    Intermittent Computing with Peripherals, Formally Verified

    Get PDF
    Transiently-powered systems featuring non-volatile memory as well as external peripherals enable the development of new low-power sensor applications. However, as programmers, we are ill-equipped to reason about systems where power failures are the norm rather than the exception. A first challenge consists in being able to capture all the volatile state of the application -- external peripherals included -- to ensure progress. A second, more fundamental, challenge consists in specifying how power failures may interact with peripheral operations. In this paper, we propose a formal specification of intermittent computing with peripherals, an axiomatic model of interrupt-based checkpointing as well as its proof of correctness, machine-checked in the Coq proof assistant. We also illustrate our model with several systems proposed in the literature.

    Efficient lock-free durable sets

    No full text